(57) Abstract

The present invention is an audio transceiver (20) which, on the receiving side (20B), controls the amount of audio data in the buffer
(15B) of a PC audio device (14B), such that the audio device (14B) always has something to play. On the transmission side (20A), the
audio transmitter provides at least sequence numbers to the audio packets to be sent. The audio receiver is concerned only with the state
of the buffer of the audio device. Therefore, the audio transmitter does not have to be synchronized with the audio receiver.

AN AUDIO TRANSCEIVER

FIELD OF THE INVENTION 

The present invention relates to apparatus for providing real-time or near
real-time communication of audio signals via a data network.

BACKGROUND OF THE INVENTION 

Data networks transfer data, typically in the form of packets which usually have
a fixed number of bytes of data, from one workstation to another. There are many types of
network protocols by which networks set up communication paths. Ethernet and Token Ring
are examples of low level network structures for packet data networks.

Regardless of the type of structure used, no network instantaneously provides
packets from a source workstation to a destination one. There is a transmission delay which
typically varies depending on the load on the network (i.e. how many workstations are trying
to send at once) and/or on the configuration of the network (i.e. which path the packet takes)
and type of protocol used.

If two sequential packets take two different paths through the network, it is
possible that they will arrive at the destination workstation after different amounts of time
traveling through the network. They might also arrive in the wrong order. Since
most data transmitted over a network is transmitted for storage purposes, the delays and
mixed up order are not critical, although it is always desirable to reduce them to a minimum.

Audio devices, which convert analog audio signals to digital ones, are known. 
These devices sample the analog audio signal, at some sampling frequency, to produce a 
digital datastream and then compress the datastream to reduce the storage or bandwidth 
requirements for storing or transmitting the datastream. The datastream can then be divided 
into packets and transmitted along a network, to be reassembled and played by the 
destination workstation. The playing involves converting the packets into the datastream 
which is then converted back into an analog signal. As is known in the art, digital to analog 
conversion also involves a converting frequency which is typically the same frequency as the
sampling frequency of the audio device.

If the audio signal is to be stored by the destination workstation, then the
delays and changed sequence are not critical. When the audio signal is retrieved from 
storage and played, it will be played smoothly since all of its packets are present in the 
storage medium. 

However, if a real-time conversation is desired, an "audio packet" should be
played by the audio device as soon as it arrives. This is difficult when working over a network
for exactly the reasons described hereinabove: the packet order is not necessarily maintained
during transmission and there is a network delay which is not of a fixed value. Furthermore,
even if the delays are overcome, if the audio device of the source workstation has a sampling
frequency which is different (faster or slower) than the converting frequency of the audio
device of the destination workstation, the two cards will not be synchronized. If the source
workstation samples at a higher frequency, the destination workstation will not be able to play
the packets fast enough. Conversely, if the source workstation samples at a lower frequency,
the destination workstation will not have enough packets to play.

The following two articles discuss the issues involved in providing audio
communication over a packet data network:

Clifford J. Weinstein and James W. Forgie, "Experience with Speech
Communication in Packet Networks", IEEE Journal on Selected Areas in Communications,
Vol. SAC-1, No. 6, December 1983, pp. 963 - 980; and

Warren A. Montgomery, "Techniques for Packet Voice Synchronization", IEEE
Journal on Selected Areas in Communications, Vol. SAC-1, No. 6, December 1983,
pp. 1022 - 1028.

The first article discusses network protocols for transmitting speech. The
second article discusses a packet voice receiver unit which chooses a target playout time for
each packet. The playout time is a fixed interval after its production by the source
workstation. The packet is played only if it arrives before its target playout time. The second
article also discusses a number of methods for determining the delay encountered by a
packet due to the network. Since the second article assumes that the two audio devices are
almost synchronized (i.e. their frequencies are very close) and that speechbursts are short,
it increases the target playout time to compensate for the lack of synchronization.

The second article also discusses adaptively changing the target playout time, 
typically during silent periods. It can also change the target playout time during playout, 
although the article mentions that changing the playout time during playout requires
maintaining the pitch of the speech. Finally, the second article discusses the impact of
synchronization techniques on network design.

Programs for enabling audio communication over networks of similar types of 
workstations are known. For example, the programs NetFone and Vtalk are designed to send 
voice signals over a data network; however, these programs work only between workstations 
manufactured by Sun Microsystems, Inc. of USA. 

A voice communication system over a network running the Ethernet protocol 
is commercially available from Genisys Comm Inc. of Rome, New York, USA. This system 
works with personal computers (PCs).

SUMMARY OF THE PRESENT INVENTION

It is an object of the present invention to provide an audio transceiver between 
a personal computer (PC) and a packet data network. 

The present invention is an audio transceiver (having an audio receiver and
a transmitter) which, on the receiving side, adaptively controls the amount of audio data in the
buffer of a PC audio device, such that the audio device always has something to play. On
the transmission side, the audio transmitter provides at least sequence numbers to the audio
packets to be sent. The audio receiver receives the audio packets, processes them and plays
them as soon as possible thereafter.

Since the present invention does not measure the amount of time it took for
the audio packets to come, the audio receiver and transmitter can be placed at the ends of
any size network (one with a short delay or one with a long delay).

In addition, the audio receiver is concerned only with the state of the buffer of
the audio device. Therefore, the audio transmitter does not have to be synchronized with the
audio receiver. Their clocks can be slightly or significantly different; the audio receiver can
handle both situations.

Specifically, in accordance with a preferred embodiment of the present
invention, the audio transceiver includes apparatus for sequence-stamping outgoing audio
packets received from the audio device and, on input, apparatus for receiving a stream of the
sequence-stamped audio packets from the packet data network, fullness setting apparatus
and fullness adjusting apparatus. The fullness setting apparatus transfers a silence buffer to
a playback buffer of the audio device whenever the playback buffer is empty. The fullness
adjusting apparatus adaptively controls the fullness of the playback buffer to generally match
the playout rate of the audio device with the rate at which the audio packets are received.

In addition, in accordance with a preferred embodiment of the present
invention, the transceiver includes apparatus for sequence- and destination-stamping all of
the audio packets and apparatus for transmitting the audio packets via the network.

Moreover, in accordance with a preferred embodiment of the present invention,
the transceiver includes sound detection apparatus which receives audio packets from the first
audio device, which determines when the audio packets begin to contain sound and which
sends the audio packets from the beginning of the sound.

Still further, in accordance with a preferred embodiment of the present
invention, the packet data network is a private or, alternatively, a public network.

Additionally, in accordance with a preferred embodiment of the present
invention, the fullness setting apparatus includes apparatus for increasing an adjustable
fullness level. The adjusting apparatus includes apparatus for decreasing the adjustable
fullness level and apparatus for processing audio data within the audio packets in order to fill
the playback buffer to the current value of the fullness level.

Moreover, in accordance with a preferred embodiment of the present invention,
the apparatus for processing includes apparatus for determining the amount of data in the
playback buffer during a predetermined window of time. The apparatus for processing
typically includes apparatus for adding and removing portions of the audio data as a function
of whether the current amount of data is less or more than the current value of the
fullness level.

Further, in accordance with a preferred embodiment of the present invention,
the apparatus for adding and removing includes apparatus for maintaining the size of the
portions of audio data until the current amount of data reaches the current value of the
fullness level.

Finally, the present invention includes a method for processing audio data
which includes the actions performed by the elements described hereinabove.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will be understood and appreciated more fully from the 
following detailed description taken in conjunction with the drawings in which: 

Fig. 1 is a schematic illustration of a plurality of workstations connected 
together via a network, wherein each workstation has an audio transceiver constructed and 
operative in accordance with a preferred embodiment of the present invention; 

Fig. 2 is a block diagram illustration of the elements of the audio transceiver
shown in Fig. 1;

Fig. 3 is a schematic illustration of the transmitter portion of the transceiver of
Fig. 2 in conjunction with the audio device;

Fig. 4 is a block diagram illustration of receiver elements of the audio
transceiver responsible for reassembling the audio packets received from the network such
that they can be played in real-time; and

Fig. 5 is a flow chart illustration of the method performed by the receiver of Fig. 4.

WO 96/15598

PCT/CS93/14123

DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT 

Reference is now made to Fig. 1 which illustrates a network having audio
communication via a plurality of audio transceivers, constructed and operative in accordance
with a preferred embodiment of the present invention, and to Fig. 2 which illustrates, in
general block diagram format, the elements of one audio transceiver of the present invention.

Fig. 1 illustrates a plurality of workstations 10 connected together via a packet
data network 12. The data network 12 can be any type of network, such as a local area
network (LAN) or a wide area network (WAN), and it can run any desired network protocol,
such as SPX/IPX, TCP/IP, etc. Each workstation 10 is formed of a personal computer (PC)
having an audio device 14 and a network device 19. The network device 19 connects its
workstation 10 to the network 12. The audio device 14 is connected to a speaker 16 and a
microphone 18 and is operative to play digitally recorded sound on the speaker 16 and to
convert sound from the microphone 18 to a digital signal. Typical audio devices 14 have
playback buffers 15.

The audio transceivers 20 of the present invention bridge between the audio
devices 14 and the network devices 19 so that two workstations 10 can provide sound to
each other in real- or near real-time, thus enabling the users at the two workstations 10 to
have a reasonable voice conversation with each other.

As will be described in more detail hereinbelow, the audio transceiver 20 has
an audio receiver and an audio transmitter. On the transmission side, the audio transmitter
converts the audio datastream to packets and provides at least sequence numbers to the
packets. On the receiving side, the audio receiver receives the audio packets and, in
accordance with a preferred embodiment of the present invention, adaptively controls the
amount of audio data in the playback buffer 15 of the audio device 14 to maintain a desired
fullness level.
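
By way of illustration only, the sequence stamping performed on the transmission side might be sketched in Python as follows. The names AudioPacket, sequence_stamp and missing_between are hypothetical and are not taken from the specification, and the choice of a simple incrementing integer as the sequence number is an assumption.

```python
from dataclasses import dataclass
from itertools import count
from typing import Iterable, Iterator, List


@dataclass(frozen=True)
class AudioPacket:
    seq: int             # sequence number added by the transmitter
    frames: List[bytes]  # audio frames carried by this packet


def sequence_stamp(frame_groups: Iterable[List[bytes]],
                   start: int = 0) -> Iterator[AudioPacket]:
    """Wrap each group of frames in a packet carrying the next sequence number."""
    for seq, frames in zip(count(start), frame_groups):
        yield AudioPacket(seq=seq, frames=list(frames))


def missing_between(previous_seq: int, new_seq: int) -> int:
    """Number of packets lost between two consecutively received packets."""
    return max(0, new_seq - previous_seq - 1)
```

Because the receiver in this sketch relies only on the sequence numbers, no timestamp and no shared clock between the two workstations is needed.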

Fig. 2 illustrates the general structure of two audio transceivers 20, a source
transceiver 20a and a destination transceiver 20b. The explanation of the general operation
of the audio transceiver 20 will be provided herein in the context of a conversation between
transceivers 20a and 20b.

Each transceiver comprises a network interface 30, a call manager 32, an
audio manager 34, and a session manager 33, where the elements of the source transceiver
20a are labeled with an 'a' suffix and those of the destination transceiver are labeled with a
'b' suffix. The network interfaces 30 divide the audio datastream into packets and, via the
network device 19, the network interfaces 30 connect to the network 12. The network
interfaces 30 know the addresses of the workstations on the network and serve to connect
their audio transceivers 20 to the desired destination workstation.

Through the source call manager 32a, the operator indicates with whom he
wants to talk. The source call manager 32a converts the name of the person to the address
of the workstation at which the person works and prepares a "call initiation" message (a data
message) to that destination workstation. The source network interface 30a sends the call
initiation message. The destination network interface 30b receives the call initiation message
and provides it to its call manager 32b which, in turn, indicates to its operator that a call is
being initiated. This indication can be via the display of the destination PC or by making a
"call initiation" sound, such as that of a bell, on the destination audio device. Typically, the
destination call manager 32b also indicates to the operator who initiated the call.

The operator, if he wishes to talk to the person who initiated the call, makes
an appropriate indication to the destination call manager 32b. In response, the destination
call manager 32b sends an "OK to talk" message, through its network interface 30b, to the
source audio transceiver 20a. The "OK to talk" message also indicates to the destination
network interface 30b that further messages (which will contain audio data) are to be sent to
its audio manager 34b.

The source network interface 30a, upon receipt of the "OK to talk" message,
sends it to the source call manager 32a which, in turn, may provide an appropriate indication
to its operator. The indication can be any desired type of indication, such as a sound like a
telephone being picked up or some phrase, such as "OK to talk" or "Open", which indicates
that the call has been successfully initiated. The "OK to talk" message also indicates to the
source network interface 30a that any further messages are to be communicated to and from
the audio manager 34a.

The call managers 32a and 32b periodically send call control signals indicating
that their audio transceiver is currently active. The managers 32a and 32b monitor the flow
of these control signals and also provide "end of conversation" indications. These can come
as commands from the respective operators or after a predetermined length of time during
which no control signal was received from the destination transceiver 20b.

The session managers 33 provide overall control to the elements of each audio 
transceiver 20. In particular, they manage the logical level of the session with the remote 
party.

The audio managers 34a and 34b process the digital audio data received from
their respective network interfaces 30 and from their respective audio devices 14. The audio
managers 34 are divided into audio transmitters 35 (detailed in Fig. 3) and audio receivers 37
(detailed in Figs. 4 and 5).

During a conversation, the transmitting audio manager 34a receives the audio
datastream from its corresponding audio device 14 and processes the datastream to remove
any silent parts. The resultant datastream is provided to the network interface 30a which
divides the datastream into packets and adds network information, such as source and
destination workstation addresses, to each packet. The packets are then sent to the network
12.

The receiving network interface 30b receives the packets and strips them of
the network information, producing thereby an audio datastream. The receiving audio
manager 34b processes the datastream in order to ensure that the playback buffers, labeled
15a and 15b, of their respective audio devices 14 have enough digital audio data to play,
irrespective of a) the rate at which the packets arrive, b) the sampling frequency of the source
audio device 14a or c) the time at which the packets were originally produced.

If desired, the audio manager 34a can compress the audio datastream prior
to sending it to the network interface 30a to form into packets. The compression (and
decompression on the reception side) can be implemented using any suitable audio
compression/decompression technique, such as the Adaptive Differential Pulse Code Modulation
(ADPCM) technique described in the CCITT G.721 standard.

Fig. 3, to which reference is now made, illustrates the elements and operation
of the audio transmitter 35 of one audio manager 34. The audio transmitter 35 comprises a
generally lossless sound detector, formed of a voice operated transmitter (VOX) 40, a buffer
42 and switch means 44. The sound detector removes any silent periods and enables each
user to speak without having to indicate when he is finished speaking (i.e. so that the other
person can begin speaking).

It is noted that people do not talk continuously but rather talk in bursts, known
as "speechbursts". The sound detector determines when the audio datastream includes a
speechburst (as opposed to background noise) and shifts the datastream to account for the
processing time of the VOX 40. Thus, the datastream which the VOX 40 processes is also
stored in the buffer 42 whose length is generally related to the processing time of the VOX
40. Once the VOX 40 detects a significant sound within the datastream (which typically
occurs near but not at the beginning of a speechburst), it indicates to the switch means 44
to output the data stored in the buffer 42. If no sound was detected, the data stored in buffer
42 are overwritten.

In particular, the VOX 40 considers sound to be present as soon as some data
within the buffer 42 is above a sound threshold level which is typically, but not necessarily,
user-adjustable. The entire contents of the buffer 42 (the datapoint above the sound threshold
level plus all of the data before it) are output to the network interface 30 for division into packets.

When the buffer has had no datapoints above a silence threshold level, which
is typically lower than the sound threshold level, for a few milliseconds (i.e. the speechburst
or conversation has ended), the VOX 40 indicates to the switch means 44 to disconnect
the buffer 42 from the network interface 30.
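
As a non-limiting sketch, the behaviour of the sound detector described above might be modelled in Python as follows. The function name vox_gate and its parameters are hypothetical; in particular, expressing the length of the buffer (preroll) and the "few milliseconds" of silence (hangover) as sample counts is an assumption made for simplicity.

```python
from collections import deque
from typing import Deque, Iterable, List


def vox_gate(samples: Iterable[int], sound_threshold: int,
             silence_threshold: int, preroll: int, hangover: int) -> List[int]:
    """Return only the samples that belong to detected speechbursts."""
    buffer: Deque[int] = deque(maxlen=preroll)  # plays the role of buffer 42
    out: List[int] = []
    sending = False   # state of the switch (switch means 44)
    quiet_run = 0     # consecutive samples at or below the silence threshold
    for s in samples:
        level = abs(s)
        if sending:
            out.append(s)
            quiet_run = quiet_run + 1 if level <= silence_threshold else 0
            if quiet_run >= hangover:
                sending = False   # disconnect: the speechburst has ended
                buffer.clear()
        else:
            buffer.append(s)      # the oldest data is overwritten automatically
            if level > sound_threshold:
                out.extend(buffer)  # the triggering sample plus the data before it
                buffer.clear()
                sending = True
                quiet_run = 0
    return out
```

Keeping the buffered data that precedes the triggering sample is what makes the detector generally lossless: the quiet onset of a speechburst is transmitted even though the threshold is crossed only slightly later.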

Reference is now made to Fig. 4 which illustrates the elements of the audio
receiver 37 of one audio manager 34. Audio receiver 37 comprises a packet handler 50, an
initial fullness setting unit 51, a fullness adjuster 52, and switching means (noted by switches
54) for switching between the units 51 and 52. The output of audio receiver 37 is provided
to the playback buffer 15 of the audio device 14.

It is noted that, since people speak in speechbursts, once a speechburst has
ended, the playback buffer 15 will have nothing left to play. Thus, the fullness setting unit 51
is activated at the beginning of each speechburst.

It is also noted that the playback buffer 15 is a first-in, first-out (FIFO) buffer 
which, when requested by the audio device 14, provides the audio device with the oldest 
audio data stored therein. There is a minimum level of fullness, which varies with the type 
of audio device 14 utilized, below which the playback buffer 15 should not go, except if the 
speechburst has ended. 

The packet handler 50 receives the audio datastream and the sequence
number of each packet from the corresponding network interface 30 and notes the sequence
number of the packet. It is noted that each packet stores a plurality of "frames" of audio data
and that each frame of audio data can be of any length and can include compressed or
uncompressed data in it.

The packet handler 50 resamples the audio data to match the converting
frequency of its corresponding audio device 14, as described in more detail hereinbelow.
Packet handler 50 also compensates for missing packets by utilizing the packets before and
after the missing packets and, if necessary, by adding frames of silence. Frames of silence
are frames with silence sounds in them.

Since the network routes each packet separately, the packets do not
necessarily arrive in order or at a regular rate. At the beginning of a speechburst, this "jitter"
in the arrival rate can be extremely problematic. Therefore, the fullness setting unit 51
determines a desired fullness level to overcome most of the jitter and provides the playback
buffer 15 with a block of silence data to fill the playback buffer 15 to the desired fullness level.
The fullness setting unit 51 indicates to the adjuster 52 what the fullness level is, after which the
switch means 54 switch control to the fullness adjuster 52.

While the audio device 14 is playing the silence block, the fullness adjuster 52
handles the incoming audio data and provides them to the playback buffer 15. The fullness
adjuster 52 adds or removes audio data in order to match the playback rate of the audio
device 14 with the rate at which the converted audio data is present. In other words, the
packet handler 50 generally converts, or scales, the audio datastream to the converting rate
of the audio device 14 of the destination workstation and the fullness adjuster 52 performs
fine adjustments to the data rate of the incoming datastream to more accurately match the
converting rate of the audio device 14. To do so, the fullness adjuster 52 adjusts the desired
fullness level.

If the audio device 14 plays all the data in its buffer 15, either before or when
the speechburst ends, typically due to increased jitter on the network, the switch means 54
switches control to the fullness setting unit 51 which slightly increases the fullness level to
compensate for the increased jitter and provides another silence block of the size of the
increased fullness level.

Fullness adjuster 52 processes the audio data to ensure that the playback
buffer 15 is as full as necessary but not overly full, since the more data stored in the playback
buffer 15, the longer it takes before the operator hears the received data. Fullness adjuster
52 adjusts the rate of the audio data so as to generally match the playback rate of the audio
device 14. Thus, if the playback rate is faster than that of the converted audio data, fullness
adjuster 52 adds extra samples to the audio data every so many samples. Conversely, if the
playback rate is slower than the rate of converted data, fullness adjuster 52 drops every so
many audio samples.
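
The periodic adding and dropping of samples might, for example, be sketched in Python as follows. The function adjust_rate and its arguments are hypothetical, and duplicating the sample at each adjustment point is only one assumed way of producing the extra sample.

```python
from typing import Iterable, List


def adjust_rate(samples: Iterable[int], sign: int, period: int) -> List[int]:
    """Add (sign > 0) or drop (sign < 0) one sample out of every `period` samples."""
    if sign == 0 or period <= 0:
        return list(samples)
    out: List[int] = []
    for i, s in enumerate(samples, start=1):
        if i % period == 0:
            if sign < 0:
                continue       # drop every period-th sample
            out.append(s)
            out.append(s)      # emit the sample twice to add one extra sample
        else:
            out.append(s)
    return out
```

Dropping one sample in every five shortens the stream by 20%, so smaller corrections simply use a longer period.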

It will be appreciated that the packet handler 50 and the fullness adjuster 52
not only compensate for mismatches between the playback rate and the rate at which the
network transfers packets, but also compensate for differences between the packet creation
rate of the source audio device 14a and that of the playback rate of the destination audio
device 14b. Thus, the audio transceiver of the present invention enables communication
between PCs having audio devices by different manufacturers, which typically do not have
similar sampling and playback rates. Similarly, the present invention enables a single audio
device manufacturer to produce audio devices whose sampling rates range within a large
tolerance range.

It will further be appreciated that the fullness setting unit 51 and the fullness 
adjuster 52 operate to maintain the playback buffer 15 full, without any knowledge of when 
the incoming audio data were originally produced. 

Reference is now made to Fig. 5 which illustrates the operation of the receiver
37, for each incoming packet, in flow chart format.

When a packet arrives (step 100), its datastream is first resampled (step 102)
to match the converting frequency of the destination audio device 14. The resampling
procedure can be any resampling procedure which performs anti-alias filtering, interpolation
and decimation. The method utilized by the CAT audio device, commercially available from
the common assignees of the present invention, is suitable and is operative on PCs. Other
methods of resampling are also known.

Afterwards, the sequence number of the new packet is compared to that of the
previously received packet. If there is a gap between the two sequence numbers (step 104),
the gap is filled (step 106). One method for filling the gap is as follows: the frames
bordering each side of the gap are duplicated and the remaining frames of the missing packet
or packets are filled with silence. Thus, if the two received packets have frames P1, P2, P3
and P4, P5, P6 in order, and one packet is missing, the resultant series will be: P1, P2, P3,
P3, silence, P4, P4, P5, P6. Other methods of filling the gap are also possible.
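
The gap-filling method of step 106 might be sketched in Python as follows. The function fill_gap and the SILENCE placeholder are hypothetical names, frames are represented as strings for readability, and a fixed number of frames per packet is assumed.

```python
from typing import List, Sequence

SILENCE = "silence"  # placeholder standing in for a frame of silence


def fill_gap(before: Sequence[str], after: Sequence[str],
             missing_packets: int, frames_per_packet: int) -> List[str]:
    """Rebuild the frame series around a gap of `missing_packets` packets."""
    missing_frames = missing_packets * frames_per_packet
    filler: List[str] = []
    if missing_frames >= 1:
        filler.append(before[-1])   # duplicate the frame bordering the gap on one side
    if missing_frames >= 2:
        filler.extend([SILENCE] * (missing_frames - 2))
        filler.append(after[0])     # duplicate the frame bordering the other side
    return list(before) + filler + list(after)
```

With packets of three frames and one missing packet, the call fill_gap(["P1", "P2", "P3"], ["P4", "P5", "P6"], 1, 3) yields the series given in the text.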

Steps 100 - 106 form the operations of packet handler 50. 

Whether or not a packet was missing, in step 108, the receiver 37 determines
the current amount of data DATA_AMOUNT stored in the playback buffer. The current
amount of data DATA_AMOUNT indicates the delay between the arrival of an audio sample
and the time it is played out by the audio device 14 and is defined as the difference between
the amount of data sent for use by the playback buffer 15 and the amount of data which the
playback buffer 15 utilized. In equation format:

DATA_AMOUNT = AMOUNT_SENT - AMOUNT_RETURNED (1)

Typically, the amount sent and amount returned are continually calculated over the course
of a speechburst. DATA_AMOUNT is calculated over a moving window of time (of typically
2 seconds) and thus is an average, rather than an instantaneous value.
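
One possible way of computing DATA_AMOUNT as a moving-window average of equation (1) is sketched below in Python. The class DataAmountTracker and its methods are hypothetical, and averaging the instantaneous values sampled within the window is an assumption; the specification does not state how the average is formed.

```python
from collections import deque
from typing import Deque, Tuple


class DataAmountTracker:
    """Tracks AMOUNT_SENT - AMOUNT_RETURNED averaged over a moving time window."""

    def __init__(self, window_s: float = 2.0) -> None:
        self.window_s = window_s
        self.amount_sent = 0
        self.amount_returned = 0
        self._history: Deque[Tuple[float, int]] = deque()

    def sent(self, n: int) -> None:
        self.amount_sent += n        # data handed to the playback buffer

    def returned(self, n: int) -> None:
        self.amount_returned += n    # data the playback buffer has utilized

    def sample(self, now_s: float) -> float:
        """Record the instantaneous value of equation (1); return the windowed mean."""
        self._history.append((now_s, self.amount_sent - self.amount_returned))
        while self._history[0][0] < now_s - self.window_s:
            self._history.popleft()  # discard samples older than the window
        return sum(v for _, v in self._history) / len(self._history)
```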

In step 110, the receiver 37 determines if the current amount of data
DATA_AMOUNT is 0 (i.e. the playback buffer 15 is empty). If it is more than 0, step 114 is
performed. Otherwise, step 112 is performed.

If, despite the operation of the fullness adjuster 52, the playback buffer 15 was
emptied, this indicates that the network jitter has gotten worse or that the speechburst has ended.
To compensate for the possible increased jitter, the fullness setting unit 51 increases the
fullness level by a predefined amount, such as by 10%. At the same time, the fullness setting
unit 51, in step 112, sends a silence block of at least the size of the current desired fullness
level.
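
Step 112 might be sketched in Python as follows. The function on_buffer_empty is a hypothetical name, the fullness level is assumed to be expressed as a number of samples, and a zero-valued sample is assumed to represent silence.

```python
from typing import List, Tuple


def on_buffer_empty(fullness_level: int, increase_percent: int = 10,
                    silence_sample: int = 0) -> Tuple[int, List[int]]:
    """Raise the desired fullness level and build the silence block to send."""
    # Integer arithmetic avoids rounding surprises; always grow by at least 1.
    step = max(1, fullness_level * increase_percent // 100)
    new_level = fullness_level + step
    silence_block = [silence_sample] * new_level  # sized to the new level
    return new_level, silence_block
```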

If, alternatively, the playback buffer was not empty, the fullness adjuster 52 is
operative. In step 114, it determines whether or not the current desired fullness level is too
large and in step 116, it adjusts the data to be sent to the playback buffer 15 in order to
achieve the desired fullness level.

The current desired fullness level is increased (in step 112), whenever the
playback buffer 15 approaches empty and is decreased (in step 114) whenever there were
no gaps during the last predetermined length of time, such as for 10 seconds. If the minimum
current amount of data DATA_AMOUNT for the last, say, 10 seconds, is larger than the
minimum allowed for the specific audio device 14, then the desired fullness level is set to the
mean value between the minimum allowed amount of data (for the specific audio device 14)
and the minimum DATA_AMOUNT for the last, say, 10 seconds.
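
The reduction rule of step 114 might be sketched in Python as follows. The function reduced_fullness_level is a hypothetical name, all quantities are assumed to share one unit (for example milliseconds of audio), and capping the result at the current level is an added assumption so that the step never raises the level.

```python
def reduced_fullness_level(current_level: float, min_recent_data_amount: float,
                           min_allowed: float) -> float:
    """Return the desired fullness level after step 114."""
    if min_recent_data_amount > min_allowed:
        # Mean of the device minimum and the lowest DATA_AMOUNT recently observed.
        mean = (min_allowed + min_recent_data_amount) / 2
        return min(current_level, mean)  # assumption: never increase in this step
    return current_level                 # buffer ran too close to the minimum
```

For example, with a current level of 400, a recent minimum DATA_AMOUNT of 300 and a device minimum of 100, the new level is 200.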

It will be appreciated that any other function to reduce the desired fullness level
which reduces the level without causing a gap to occur, is also suitable.

In step 116, the difference between the current amount of data
DATA_AMOUNT and the desired fullness level is determined. The difference can be
measured in seconds of data or in numbers of blocks of data. The data to be sent to the
playback buffer 15 is then processed to force the difference to be as close to zero as
possible.

The processing involves adding or removing audio samples as a function of
how large the difference is. A positive difference (i.e. the current amount of data is larger
than the desired fullness level), indicates that the input rate is higher than the playback rate.
Therefore, some of the audio samples should be removed. A negative difference requires
the addition of audio samples.

For each type of audio device, the receiver 37 has a LookUp Table (LUT)
defining the function for adding and removing. The following table is useful for the
SoundBlaster audio devices:

Difference (in msec)    Duplication amount
         500            -1 per 20  = -5%
         300            -1 per 50  = -2%
         150            -1 per 100 = -1%
        -150            +1 per 100 = +1%
        -300            +1 per 50  = +2%
        -500            +1 per 20  = +5%

where +1 and -1 indicate addition and removal of a frame and "per X" indicates for every X
frames.
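
A lookup of the duplication amount might be sketched in Python as follows. The function find_duplication_amount is a hypothetical name. The -300 and -500 thresholds are assumed by symmetry with the positive rows, and treating each threshold as "at least this large a difference" is likewise an assumption, since the text does not say how values between rows are handled.

```python
from typing import List, Tuple

# (threshold in msec, frames per adjustment); negative thresholds are assumed.
REMOVE_ROWS: List[Tuple[int, int]] = [(500, 20), (300, 50), (150, 100)]
ADD_ROWS: List[Tuple[int, int]] = [(-500, 20), (-300, 50), (-150, 100)]


def find_duplication_amount(difference_ms: float) -> Tuple[int, int]:
    """Return (sign, period): sign -1 removes and +1 adds one frame per `period`
    frames; (0, 0) means that no adjustment is required."""
    for threshold, period in REMOVE_ROWS:
        if difference_ms >= threshold:
            return (-1, period)
    for threshold, period in ADD_ROWS:
        if difference_ms <= threshold:
            return (1, period)
    return (0, 0)
```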

When there are frames which have been artificially added due to a missing
packet (in step 106), then the artificially added frame is selected as the one to be removed.
When a frame is to be added, the frame which is added is a copy of the frame which will be
next to it.
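
The selection of the frame to be removed or added might be sketched in Python as follows. The Frame representation and the functions remove_one and add_one are hypothetical; dropping the last frame when no artificially added frame is available is an assumption, as the text does not address that case.

```python
from typing import List, Tuple

Frame = Tuple[str, bool]  # (payload, artificial); artificial marks gap-filler frames


def remove_one(frames: List[Frame]) -> List[Frame]:
    """Drop one frame, preferring a frame artificially added in step 106."""
    if not frames:
        return []
    for i, (_, artificial) in enumerate(frames):
        if artificial:
            return frames[:i] + frames[i + 1:]
    return frames[:-1]  # assumption: otherwise drop the last frame


def add_one(frames: List[Frame], position: int) -> List[Frame]:
    """Insert, right after `position`, a copy of the frame found there."""
    payload, _ = frames[position]
    return frames[:position + 1] + [(payload, True)] + frames[position + 1:]
```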

Although not shown in Fig. 5, the particular duplication amount is maintained
until the difference is close to zero. At that point, the duplication amount can be changed.
Furthermore, when duplication or removal is occurring, the window size for determining the
current amount of data is reduced, for example to 500 msec.

Once the processing has finished, the data of the packet are sent (step 118)
to the playback buffer 15 and the process repeated for the next packet.

The following pseudo code details the operation of the duplication/removal
mechanism of step 116:

Pseudocode for Step 116: 

PDup - Previous Duplication_Amount 
Duplication_Amount 

(Duplication_Amount < 0 means deletion). 
(Duplication_Amount > 0 means duplication). 

Data_Amount - (averaged over 2 sec.) 
Short_Data_Amount - (averaged over 500 ms.) 

Fullness_Level 
Difference_in_Amount 

Init: 

Duplication_Amount = 0; 
PDup = 0; 

When packet arrives: 

(Reevaluate Duplication_Amount) 

Difference_in_Amount = Data_Amount - Fullness_Level 

Short_Difference_in_Amount = Short_Data_Amount - Fullness_Level 

Duplication_Amount = find(Difference_in_Amount) in table. 

(Stop Dup/Del Process) 
if (PDup != 0) { 

    if (PDup < 0 && Short_Difference_in_Amount < 0) 
        PDup = 0; (Stop Deletion) 
    if (PDup > 0 && Short_Difference_in_Amount > 0) 
        PDup = 0; (Stop Duplication) 
    if (|Duplication_Amount| > |PDup|) 
        PDup = Duplication_Amount; 
        (if difference increases then increase Dup/Del accordingly) 

} 

if (PDup == 0) { 

    Duplication_Amount = find(Difference_in_Amount) in table. 

    PDup = Duplication_Amount 

} 
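The pseudo code for step 116 can be sketched in Python as follows. This is a hedged rendering under stated assumptions, not a definitive implementation: the class name `DuplicationController`, the helper `lookup`, the use of signed percentages for the duplication amount, and the reading of the short-term difference as `Short_Data_Amount - Fullness_Level` are assumptions. The helper also assumes that the table rows act as thresholds which are mirrored for negative differences.

```python
def lookup(difference_msec):
    """Assumed table lookup: signed duplication amount in percent."""
    table = [(500, -5), (300, -2), (150, -1)]
    for threshold, amount in table:
        if difference_msec >= threshold:
            return amount       # buffer too full: remove frames
        if difference_msec <= -threshold:
            return -amount      # buffer too empty: duplicate frames
    return 0


class DuplicationController:
    """Keeps the previous duplication amount (PDup) between packets."""

    def __init__(self, find, fullness_level):
        self.find = find
        self.fullness_level = fullness_level
        self.duplication_amount = 0
        self.pdup = 0

    def on_packet(self, data_amount, short_data_amount):
        difference = data_amount - self.fullness_level
        short_difference = short_data_amount - self.fullness_level
        self.duplication_amount = self.find(difference)

        if self.pdup != 0:
            if self.pdup < 0 and short_difference < 0:
                self.pdup = 0   # stop deletion
            if self.pdup > 0 and short_difference > 0:
                self.pdup = 0   # stop duplication
            if abs(self.duplication_amount) > abs(self.pdup):
                # The difference grew, so adjust more aggressively.
                self.pdup = self.duplication_amount

        if self.pdup == 0:
            self.duplication_amount = self.find(difference)
            self.pdup = self.duplication_amount
        return self.pdup
```

In this sketch the amount selected for a large difference is held while the difference shrinks, and it is released only once the short-term difference changes sign.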

It will be appreciated by persons skilled in the art that the present invention is 
not limited to what has been particularly shown and described hereinabove. Rather the scope 
of the present invention is defined by the claims which follow: 

CLAIMS 



1. An audio transceiver between an audio device of a personal computer and a 
packet data network, the audio transceiver comprising: 

a. means for receiving a stream of sequence stamped audio packets from 
said packet data network; 

b. fullness setting means for transferring at least a silence frame to a 
playback buffer of said audio device whenever said playback buffer is 
empty in order to fill said playback buffer to an adjustable fullness level; 
and 

c. fullness adjusting means for controlling the fullness level of the playback 
buffer, said fullness adjusting means operating to duplicate or remove at 
least one frame until a difference between a current fullness level and 
the desired fullness level is substantially zero. 

2. A transceiver according to claim 1 and also comprising: 

a. means for sequence- and destination-stamping all of said audio packets; 
and 

b. means for transmitting said audio packets via said network. 

3. A transceiver according to any of claims 1 - 2 and also comprising voice 
detection means which stores at least a portion of said received audio packets 
and sends after said sound threshold level is reached said stored audio 
packets with audio packets received after said sound threshold level is reached. 

4. A transceiver according to any of claims 1 - 3 and wherein said packet data 
network is a private network. 

5. A transceiver according to any of claims 1 - 3 and wherein said packet data 
network is a public network. 

6. A transceiver according to any of the previous claims and wherein said fullness 
setting means comprises means for increasing said adjustable fullness level 
and wherein said fullness adjusting means includes means for decreasing said 
adjustable fullness level and means for processing audio data within said audio 
packets in order to fill said playback buffer to the current value of said 
adjustable fullness level. 

7. A transceiver according to claim 6 and wherein said means for processing 
comprises means for determining the amount of data in said playback buffer 
during a predetermined window of time. 

8. A transceiver according to any of claims 6 - 7 and wherein said means for 
processing includes means for adding and removing portions of said audio data 
as a function of whether or not the current amount of data is less or more than 
said current value of said adjustable fullness level. 

9. A transceiver according to claim 8 and wherein said means for adding and 
removing comprise means for maintaining the size of said portions of audio 
data until said current amount of data reaches said current value of said 
adjustable fullness level. 

10. A transceiver according to the previous claims wherein said means for receiving 
further comprising means for duplicating at least one frame of two packets 
bordering a gap, said gap detected whenever the sequence number of said two 
adjacent packets is not consecutive. 

11. A transceiver according to claim 10 wherein said means for receiving further 
comprises inserting at least one silence frame between said duplicated frames. 

12. A transceiver according to any of claims 10 - 11 wherein said fullness adjusting 
means remove at least one frame of the frames duplicated by said means for 
receiving due to said gap. 

13. A transceiver according to any of claims 10 - 12 wherein said fullness adjusting 
means duplicate at least one frame of the frames not duplicated by said means 
for receiving due to said gap. 

14. A transceiver according to any of claims 10 - 12 wherein said adjustable 
fullness level is determined in accordance with the variations in size of said 
playback buffer. 

15. A method for transmitting and receiving audio data between an audio device 
of a personal computer and a packet data network, the method comprising the 
steps of: 

a. receiving a stream of sequence-stamped audio packets from said packet 
data network; 

b. transferring at least a silence frame to a playback buffer of said audio 
device whenever said playback buffer is empty in order to fill said 
playback buffer to an adjustable fullness level; and 

c. controlling the fullness level of the playback buffer, said controlling 
comprising duplicating or removing at least one frame until a difference 
between a current fullness level and the desired fullness level is 
substantially zero. 

16. A method according to claim 15 and also comprising the steps of: 

a. sequence- and destination-stamping all of said audio packets; and 

b. transmitting said audio packets via said network. 

17. A method according to any of claims 15 - 16 and also comprising the steps of 
storing at least a portion of audio packets received before a sound threshold 
level is reached and sending after said sound threshold level is reached said 
stored audio packets with audio packets received after said sound threshold 
level is reached. 

18. A method according to any of claims 10 - 17 and wherein said packet data 
network is a private network. 

19. A method according to any of claims 10 - 17 and wherein said packet data 
network is a public network. 

20. A method according to any of claims 10 - 19 and wherein said step of 
transferring includes the step of increasing said adjustable fullness level and 
wherein the step of controlling includes the steps of decreasing said adjustable 
fullness level and processing audio data within said audio packets in order to 
fill said playback buffer to the current value of said adjustable fullness level. 

21. A method according to claim 20 and wherein said step of processing includes 
the step of determining the amount of data in said playback buffer during a 
predetermined window of time. 

22. A method according to any of claims 20 - 21 and wherein said step of 
processing includes the step of adding and removing portions of said audio 
data as a function of whether or not the current amount of data is less or more 
than said current value of said adjustable fullness level. 

23. A method according to claim 22 and wherein said step of adding and removing 
includes the step of maintaining the size of said portions of audio data until said 
current amount of data reaches said current value of said adjustable fullness 
level. 

24. A method according to any of claims 20 - 23 wherein said receiving further 
comprising duplicating at least one frame of two packets bordering a gap, said 
gap detected whenever the sequence number of said two adjacent packets is 
not consecutive. 

25. A method according to claim 24 wherein said receiving further comprises 
inserting at least one silence frame between said duplicated frames. 

26. A method according to any of claims 24 - 25 wherein said controlling comprises 
removing at least one frame of the frames duplicated in said step of receiving 
due to said gap. 

27. A method according to any of claims 24 - 26 wherein said controlling comprises 
duplicating at least one frame of the frames not duplicated by said means for 
receiving due to said gap. 



WO 96/15598    PCT/US95rt4123 



28. A method according to any of claims 20 - 27 wherein said controlled fullness 
level is determined in accordance with the variations in size of said playback 
buffer. 

29. An audio transceiver between an audio device of a personal computer and a 
packet data network, the audio transceiver comprising: 

a. means for receiving a stream of sequence stamped audio packets from 
said packet data network; and 

b. means for duplicating at least one frame of two received packets 
bordering a gap, said gap detected whenever the sequence number of 
said two adjacent received packets is not consecutive. 

30. An audio transceiver according to claim 29 also comprising means for inserting 
at least one silence frame between said duplicated frames. 

31. A method for transmitting data between an audio device of a personal 
computer and a packet data network, comprising: 

a. receiving a stream of sequence stamped audio packets from said packet 
data network; 

b. duplicating frames of two received packets bordering a gap, said gap 
detected whenever the sequence number of said two adjacent received 
packets is not consecutive. 

32. A method according to claim 31 further comprising inserting at least one silence 
frame between said duplicated frames. 

33. An audio transceiver between an audio device of a personal computer and a 
packet data network, the audio transceiver comprising: 

a. means for receiving a stream of audio packets from said packet data 
network; and 

b. voice detection means which stores at least a portion of audio packets 
received before a sound threshold level is reached and which sends after 
said sound threshold is reached stored audio packets with audio packets 
received after said threshold value is reached. 

34. A method for transmitting and receiving audio data between an audio device 
of a personal computer and a packet data network, the method comprising the 
steps of: 

a. receiving a stream of audio packets from said packet data network; 

b. storing at least a portion of audio packets received before a sound 
threshold level is reached; and 

c. sending after said sound threshold level is reached said stored audio 
packets with audio packets received after said sound threshold level is 
reached. 